Search for: All records

Creators/Authors contains: "Xu, Weiwei"


  1. In parallel simulation, convergence and parallelism are often seen as inherently conflicting objectives. Improved parallelism typically entails lighter local computation and weaker coupling, which unavoidably slows global convergence. This paper presents a novel GPU algorithm that achieves convergence rates comparable to full-space Newton's method while remaining as parallelizable as the Jacobi method. Our approach is built on a key insight into the phenomenon of overshoot. Overshoot occurs when a local solver aggressively minimizes its local energy without accounting for the global context, producing a local update that undermines global convergence. To address this, we derive a theoretically second-order optimal solution that mitigates overshoot, and we adapt this solution into a pre-computable form. Leveraging Cubature sampling, our runtime cost is only marginally higher than that of the Jacobi method, yet our algorithm converges nearly quadratically, like Newton's method. We also introduce a novel full-coordinate formulation for more efficient pre-computation. Our method integrates seamlessly with the incremental potential contact method and achieves second-order convergence for both stiff and soft materials. Experimental results demonstrate that our approach delivers high-quality simulations and outperforms state-of-the-art GPU methods with 50× to 100× better convergence.
    Free, publicly-accessible full text available August 1, 2026
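The overshoot phenomenon central to this abstract can be illustrated on a toy quadratic energy. The sketch below is not the paper's second-order optimal correction; it uses plain damped Jacobi with a hand-picked weight `omega` (an assumption for illustration) to show how independent local solves that ignore coupling can destroy global convergence on a strongly coupled system:

```python
import numpy as np

def jacobi_step(A, b, x, omega=1.0):
    """One (damped) Jacobi step on the energy E(x) = 0.5 x'Ax - b'x:
    each coordinate minimizes its own 1-D energy with the others frozen;
    omega < 1 tempers the overshoot of these independent local solves."""
    D = np.diag(A)
    x_local = (b - A @ x + D * x) / D   # independent per-coordinate solves
    return x + omega * (x_local - x)    # damped global update

# SPD but strongly coupled system: undamped local solves overshoot and diverge.
A = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.9],
              [0.9, 0.9, 1.0]])
b = np.ones(3)
x_exact = np.linalg.solve(A, b)

x_fast, x_damped = np.zeros(3), np.zeros(3)
for _ in range(200):
    x_fast = jacobi_step(A, b, x_fast, omega=1.0)      # overshoots, diverges
    x_damped = jacobi_step(A, b, x_damped, omega=0.5)  # converges

print(np.linalg.norm(x_fast - x_exact) > 1.0)     # True: diverged
print(np.linalg.norm(x_damped - x_exact) < 1e-8)  # True: converged
```

The paper replaces the crude fixed damping shown here with a pre-computed, theoretically second-order optimal correction, which is what recovers Newton-like convergence at near-Jacobi cost.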
  2. Context. The well-known tension between cosmological parameter constraints obtained from the primary cosmic microwave background (CMB) and those drawn from X-ray-selected galaxy cluster samples identified in early data could be explained if the incompleteness of detected clusters is higher than estimated. Specifically, we suggest that certain types of galaxy groups or clusters may have been overlooked in previous work. Aims. We aim to search for galaxy groups and clusters with especially extended surface brightness distributions by creating a new X-ray-selected catalog of extended galaxy clusters from the XMM-Spitzer Extragalactic Representative Volume Survey (XMM-SERVS) data, based on a dedicated source detection and characterization algorithm optimized for extended sources. Methods. Our state-of-the-art algorithm is composed of wavelet filtering, source detection, and characterization. We carried out a visual inspection of the optical images and of the spatial distribution of galaxies within the same redshift layer to confirm the existence of clusters, and estimated the cluster redshifts from the spectroscopic and photometric redshifts of galaxies. A growth curve analysis was used to characterize the detections. Results. We present a catalog of extended X-ray galaxy clusters detected in the XMM-SERVS data. The XMM-SERVS X-ray eXtended Galaxy Cluster (XVXGC) catalog features 141 cluster candidates. Specifically, there are 53 clusters previously identified as clusters with intracluster medium (ICM) emission (class 3); 40 that were previously known as optical or infrared (IR) clusters, but detected as X-ray clusters for the first time (class 2); and 48 identified as clusters for the first time (class 1). Compared with the class 3 sample, the "class 1 + class 2" sample is systematically fainter and exhibits a flatter surface brightness profile. Specifically, the median flux in the 0.5–2.0 keV band is 1.288 × 10⁻¹⁴ erg/s/cm² for the "class 1 + class 2" sample and 1.887 × 10⁻¹⁴ erg/s/cm² for the class 3 sample. The median values of β (i.e., the slope of the cluster surface brightness profile) are 0.506 and 0.573 for the "class 1 + class 2" and class 3 samples, respectively. The entire sample is available at the CDS.
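The β slopes quoted above parameterize the standard beta-model surface brightness profile, S(r) = S₀ [1 + (r/r_c)²]^(0.5 − 3β), and the growth curve is the cumulative flux within a radius. The sketch below (with hypothetical normalization `s0` and core radius `r_core`; only the two median β values come from the catalog) shows why a smaller β means a flatter profile and more flux at large radii:

```python
import numpy as np

def beta_model(r, s0, r_core, beta):
    """Beta-model surface brightness: S(r) = s0 * (1 + (r/r_core)^2)^(0.5 - 3*beta).
    s0 and r_core are illustrative placeholders, not catalog values."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (0.5 - 3.0 * beta)

def growth_curve(r_max, s0, r_core, beta, n=4000):
    """Cumulative flux within r_max: midpoint-rule integral of 2*pi*r*S(r) dr."""
    edges = np.linspace(0.0, r_max, n + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    dr = edges[1] - edges[0]
    return float(np.sum(2.0 * np.pi * mids * beta_model(mids, s0, r_core, beta) * dr))

# Median slopes from the catalog: beta is smaller for the newly detected sample.
beta_new, beta_known = 0.506, 0.573

# At equal central brightness, the flatter (smaller-beta) profile is brighter
# at large radii and accumulates more flux in its growth curve.
print(beta_model(5.0, 1.0, 1.0, beta_new) > beta_model(5.0, 1.0, 1.0, beta_known))
print(growth_curve(10.0, 1.0, 1.0, beta_new) > growth_curve(10.0, 1.0, 1.0, beta_known))
```

This is the qualitative reason such extended, low-surface-brightness systems are easy to miss with detection pipelines tuned to centrally peaked sources.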
  3. High-quality large-scale scene rendering requires a scalable representation and accurate camera poses. This research combines tile-based hybrid neural fields with parallel distributed optimization to improve bundle-adjusting neural radiance fields. The proposed method scales with a divide-and-conquer strategy. We partition scenes into tiles, each with a multi-resolution hash feature grid and shallow chained diffuse and specular multilayer perceptrons (MLPs). Tiles unify foreground and background via a spatial contraction function that accommodates both distant objects in outdoor scenes and planar reflections as virtual images outside the tile. Decomposing appearance with the specular MLP allows a specular-aware warping loss to provide a second optimization path for camera poses. We apply the alternating direction method of multipliers (ADMM) to achieve consensus among camera poses while maintaining parallel tile optimization. Experimental results show that our method outperforms state-of-the-art neural scene rendering methods by 5%--10% in PSNR, maintaining sharp distant objects and view-dependent reflections across six indoor and outdoor scenes.
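The abstract does not specify the exact form of its spatial contraction, so the sketch below uses the widely adopted mip-NeRF 360-style contraction as a stand-in: points inside the unit ball are left unchanged, while unboundedly distant points are squashed into a ball of radius 2, which is how a single bounded tile can represent far-away background:

```python
import numpy as np

def contract(x):
    """Spatial contraction mapping R^3 into the radius-2 ball:
    identity inside the unit ball; beyond it, a point at distance n
    maps to distance 2 - 1/n along the same direction, so the whole
    unbounded exterior fits in the shell between radius 1 and 2.
    (A common form; the paper's per-tile contraction may differ.)"""
    x = np.asarray(x, dtype=float)
    n = np.linalg.norm(x)
    if n <= 1.0:
        return x
    return (2.0 - 1.0 / n) * (x / n)

# Nearby points pass through unchanged; distant points land near the boundary.
print(contract([0.5, 0.0, 0.0]))                         # unchanged
print(np.linalg.norm(contract([100.0, 0.0, 0.0])))       # 1.99, near radius 2
```

Because the mapping is monotone in radius and never reaches 2, arbitrarily distant geometry (and virtual reflection images placed outside a tile) remain representable within the tile's bounded feature grid.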
  4. Abstract We consider semantic image segmentation. Our method is inspired by Bayesian deep learning, which improves image segmentation accuracy by modeling the uncertainty of the network output. In contrast to uncertainty, our method directly learns to predict the erroneous pixels of a segmentation network, which is modeled as a binary classification problem. This speeds up training compared to the Monte Carlo integration often used in Bayesian deep learning. It also allows us to train a branch to correct the labels of erroneous pixels. Our method consists of three stages: (i) predict the pixel-wise error probability of the initial result, (ii) redetermine new labels for pixels with high error probability, and (iii) fuse the initial result and the redetermined result with respect to the error probability. We formulate the error-pixel prediction problem as a classification task and employ an error-prediction branch in the network to predict pixel-wise error probabilities. We also introduce a detail branch to focus the training process on the erroneous pixels. We have experimentally validated our method on the Cityscapes and ADE20K datasets. Our model can easily be added to various advanced segmentation networks to improve their performance. Taking DeepLabv3+ as an example, our network achieves 82.88% mIoU on the Cityscapes testing dataset and 45.73% on the ADE20K validation dataset, improving the corresponding DeepLabv3+ results by 0.74% and 0.13%, respectively.
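Stage (iii) of the pipeline above blends two predictions per pixel using the predicted error probability. The sketch below shows one natural fusion rule, a per-pixel convex combination weighted by `p_err`; this is an illustrative assumption, as the abstract does not spell out the exact fusion operator:

```python
import numpy as np

def fuse(initial_probs, redet_probs, p_err):
    """Blend initial and redetermined class probabilities per pixel.
    initial_probs, redet_probs: (H, W, C) softmax outputs.
    p_err: (H, W) predicted probability that the initial label is wrong.
    Illustrative convex combination; the paper's rule may differ."""
    w = p_err[..., None]  # broadcast the error weight over the class axis
    return (1.0 - w) * initial_probs + w * redet_probs

# Toy 1x2-pixel, 3-class example: the second pixel is flagged as likely wrong.
initial = np.array([[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]])
redet   = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]])
p_err   = np.array([[0.0, 0.9]])

fused = fuse(initial, redet, p_err)
print(fused.argmax(-1))  # [[0 1]]: the flagged pixel switches to the new label
```

Because the weights are convex, each fused pixel remains a valid probability distribution, and pixels with low predicted error keep their initial labels untouched.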